Smart Paper Notes

A smart resource management mechanism with trust access control for cloud computing environment

Sakshi Chhabra, Ashutosh Kumar Singh

Category: Artificial Intelligence

2022-12-10

At its core, the computing business now offers subscription-based, on-demand services through cloud computing. Virtualization, which creates a virtual instance of a computer system running on an abstracted hardware layer, now lets multiple users share resources. In contrast to earlier distributed computing models, cloud computing provides virtually unlimited computing capacity through its massive datacenters, and it has become extremely popular in recent years because of its continually growing infrastructure, user base, and volume of hosted data. This article proposes a conceptual framework for a workload management paradigm in cloud settings that is both secure and performance-efficient. In this paradigm, a resource management unit performs energy- and performance-efficient virtual machine allocation, ensures the secure execution of users' applications, and protects in real time against data breaches caused by unauthorised virtual machine access. A secure virtual machine management unit controls the resource management unit and is designed to generate data on unlawful access or intercommunication. In addition, a workload analyzer unit runs concurrently to estimate resource consumption, helping the resource management unit allocate virtual machines more effectively. The proposed model also pursues this objective through additional measures, including encrypting and decrypting data before transfer and using a trust access mechanism to prevent unauthorised access to virtual machines, which add computational overhead.

translated by Google Translate
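
The abstract combines trust-gated access, a workload analyzer that estimates resource consumption, and energy-efficient VM allocation, but does not give the algorithms. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes a per-user trust score checked against a fixed threshold, an exponentially weighted moving average as the workload estimate, and best-fit placement (a common consolidation heuristic for saving energy). `Host`, `VMRequest`, `estimate_demand`, `allocate`, and `TRUST_THRESHOLD` are hypothetical names, and the encryption step is omitted.

```python
from dataclasses import dataclass

TRUST_THRESHOLD = 0.7  # hypothetical: minimum trust score a user needs


@dataclass
class Host:
    name: str
    cpu_free: float  # free CPU cores
    mem_free: float  # free memory in GB


@dataclass
class VMRequest:
    user: str
    cpu: float
    mem: float


def estimate_demand(history, alpha=0.5):
    """Workload-analyzer stand-in: exponentially weighted moving average of past usage."""
    estimate = history[0]
    for usage in history[1:]:
        estimate = alpha * usage + (1 - alpha) * estimate
    return estimate


def allocate(request, hosts, trust):
    """Place a VM on a host; return the host name, or None if rejected or nothing fits."""
    # Trust gate: users below the threshold (or unknown users) are refused.
    if trust.get(request.user, 0.0) < TRUST_THRESHOLD:
        return None
    fits = [h for h in hosts if h.cpu_free >= request.cpu and h.mem_free >= request.mem]
    if not fits:
        return None
    # Best fit on CPU: pack VMs tightly so lightly used hosts can be switched off.
    best = min(fits, key=lambda h: h.cpu_free - request.cpu)
    best.cpu_free -= request.cpu
    best.mem_free -= request.mem
    return best.name
```

For example, with hosts of 8 and 4 free cores, a trusted user's 3-core request lands on the 4-core host, leaving the larger host free for bigger requests; a request from a low-trust user is refused before any capacity is consumed.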

Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media

Yuting Guo, Swati Rajwal, Sahithi Lakamana, Chia-Chun Chiang, Paul C. Menell, Adnan H. Shahid, Yi-Chieh Chen, Nikita Chhabra, Wan-Ju Chao, Chieh-Ju Chao

Category: Natural Language Processing

2022-12-23

Migraine is a highly prevalent and disabling neurological disorder. However, information about migraine management in real-world settings may be limited to traditional health information sources. In this paper, we (i) verify that substantial migraine-related chatter, self-reported by migraine sufferers, is available on social media (Twitter and Reddit); (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) analyze the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of the information posted by our 'migraine cohort' revealed a plethora of relevant information about migraine therapies and the patient sentiments associated with them. Our study lays the foundation for in-depth analysis of migraine-related information using social media data.
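
The reported results (0.90 on Twitter, 0.93 on Reddit) are F1 scores, the harmonic mean of precision and recall on the positive class (here, self-reported migraine posts). A minimal sketch of the computation for binary labels; the helper name `f1_score` and the toy labels are illustrative only.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With 2 true positives, 1 false positive, and 1 false negative, precision and recall are both 2/3, so F1 is also 2/3.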

A Novel Approach For Generating Customizable Light Field Datasets for Machine Learning

Julia Huang, Toure Smith, Aloukika Patro, Vidhi Chhabra

Category: Computer Vision | Artificial Intelligence

2022-12-13

In numerous areas, deep learning models, which often outperform traditional approaches, are trained on large datasets of a specified medium, e.g., images. However, such datasets are lacking for light field-specific machine learning tasks. We therefore create our own light field datasets, which have great potential for a variety of applications because light fields carry far more information than single images. Using Unity and C#, we develop a novel approach for generating large, scalable, and reproducible light field datasets based on customizable hardware configurations to accelerate light field deep learning research.
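
A light field is commonly captured with a planar array of cameras, so a "customizable hardware configuration" can be reduced to a grid size, a camera spacing, and optional placement noise. The sketch below assumes that setup; it is not the authors' Unity/C# tool, and `camera_grid` with its parameters is a hypothetical helper. A seeded random generator makes the same configuration reproduce the same rig.

```python
import random


def camera_grid(rows, cols, spacing, z=0.0, jitter=0.0, seed=0):
    """Hypothetical helper: (x, y, z) positions for a rows x cols camera array centred at the origin."""
    rng = random.Random(seed)  # seeded so a given configuration is reproducible
    positions = []
    for r in range(rows):
        for c in range(cols):
            x = (c - (cols - 1) / 2) * spacing
            y = ((rows - 1) / 2 - r) * spacing  # row 0 is the top row
            if jitter > 0:
                # Optional placement noise, e.g. to mimic imperfect rigs.
                x += rng.uniform(-jitter, jitter)
                y += rng.uniform(-jitter, jitter)
            positions.append((x, y, z))
    return positions
```

`camera_grid(2, 2, 1.0)` returns the four corners `(-0.5, 0.5)`, `(0.5, 0.5)`, `(-0.5, -0.5)`, `(0.5, -0.5)` at `z = 0`; each position would then drive one virtual camera render.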

Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy

Mehrnaz Sabet, Praveen Palanisamy, Sakshi Mishra

Category: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2022-11-10

Harnessing the benefits of drones for urban innovation at scale requires reliable aerial autonomy. One major barrier to advancing aerial autonomy has been collecting large-scale aerial datasets for training machine learning models. Because real-world data collection with deployed drones is costly and time-consuming, there has been an increasing shift towards training models for drone applications on synthetic data. However, to increase the generalizability of policies trained on synthetic data, incorporating domain randomization into the data generation workflow to address the sim-to-real problem becomes crucial. Current synthetic data generation tools either lack domain randomization or rely heavily on manual work or real samples to configure and generate diverse, realistic simulation scenes. These dependencies limit the scalability of the data generation workflow. Balancing generalizability and scalability is therefore a major challenge in synthetic data generation. To address these gaps, we introduce a modular, scalable data generation workflow tailored to aerial autonomy applications. To generate realistic configurations of simulation scenes while increasing diversity, we present an adaptive layered domain randomization approach that creates a type-agnostic distribution space for assets over the base map of the environment before generating poses for drone trajectories. We leverage high-level scene structures to automatically place assets in valid configurations and then extend diversity through obstacle generation and global parameter randomization. We demonstrate the effectiveness of our method in automatically generating diverse configurations and datasets and show its potential for downstream performance optimization. Our work contributes to generating enhanced benchmark datasets for training models that generalize better to real-world situations.
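
The layering described above (asset placement over valid base-map locations, then obstacles, then global parameters) can be sketched as three successive sampling passes. This is an assumed simplification, not the authors' implementation: the base map is reduced to a list of valid grid cells, asset types are drawn independently of location to mimic a type-agnostic placement space, and the parameter names and ranges in `randomize_scene` are hypothetical. Drone pose generation is not shown.

```python
import random


def randomize_scene(valid_cells, asset_types, n_assets, n_obstacles, seed=None):
    """Hypothetical layered randomization over a base map of placeable grid cells."""
    rng = random.Random(seed)
    # Layer 1: sample asset locations from valid cells without regard to type,
    # then assign types independently (a type-agnostic placement space).
    spots = rng.sample(valid_cells, n_assets)
    kinds = rng.choices(asset_types, k=n_assets)
    assets = list(zip(kinds, spots))
    # Layer 2: scatter obstacles over the valid cells that are still free.
    free = [c for c in valid_cells if c not in spots]
    obstacles = rng.sample(free, n_obstacles)
    # Layer 3: randomize global scene parameters (hypothetical names and ranges).
    params = {
        "sun_elevation_deg": rng.uniform(10.0, 80.0),
        "fog_density": rng.uniform(0.0, 0.3),
    }
    return {"assets": assets, "obstacles": obstacles, "params": params}
```

Because each layer only draws from cells the previous layers left valid, scenes stay consistent while diversity grows with every layer; fixing `seed` regenerates an identical scene.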

Continual Learning with Dependency Preserving Hypernetworks

Dupati Srikar Chandra, Sakshi Varshney, P. K. Srijith, Sunil Gupta

Category: Machine Learning | Computer Vision

2022-09-16

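Humans learn continually throughout their lifetime by accumulating diverse knowledge and fine-tuning it for future tasks. When presented with a similar goal, neural networks suffer from catastrophic forgetting if the data distribution across sequential tasks is not stationary during learning. An effective approach to such continual learning (CL) problems is to use a hypernetwork that generates task-dependent weights for a target network. However, the continual learning performance of existing hypernetwork-based approaches is limited by their assumption that weights are independent across layers, an assumption made to maintain parameter efficiency. To address this limitation, we propose a novel approach that uses a dependency-preserving hypernetwork to generate weights for the target network while also maintaining parameter efficiency. We propose a recurrent neural network (RNN)-based hypernetwork that can generate layer weights efficiently while allowing for dependencies among them. In addition, we propose novel regularization and network-growth techniques for the RNN-based hypernetwork to further improve continual learning performance. To demonstrate the effectiveness of the proposed approach, we conducted experiments on several image classification continual learning tasks and settings. We found that the proposed methods based on RNN hypernetworks outperformed the baselines in all of these CL settings and tasks.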
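
The key idea is that an RNN emits the target network's weights layer by layer, so each layer's weights depend on the hidden state left by earlier layers instead of being generated independently. The toy sketch below assumes a plain tanh RNN, a task embedding added to a per-layer embedding, and a fixed-size weight chunk per layer; `RNNHypernet` and its parameters are hypothetical, training is omitted, and the paper's regularization and network-growth techniques are not shown.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RNNHypernet:
    """Hypothetical toy RNN hypernetwork emitting one weight chunk per target-network layer."""

    def __init__(self, emb_dim, hidden_dim, chunk_size, seed=0):
        rng = random.Random(seed)
        self.W_x = rand_matrix(hidden_dim, emb_dim, rng)     # input -> hidden
        self.W_h = rand_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden (recurrence)
        self.W_o = rand_matrix(chunk_size, hidden_dim, rng)  # hidden -> weight chunk
        self.hidden_dim = hidden_dim

    def generate(self, task_emb, layer_embs):
        h = [0.0] * self.hidden_dim
        chunks = []
        for layer_emb in layer_embs:
            # Condition each step on both the task and the layer being generated.
            x = [a + b for a, b in zip(task_emb, layer_emb)]
            pre = [a + b for a, b in zip(matvec(self.W_x, x), matvec(self.W_h, h))]
            # h carries information from earlier layers, so consecutive layers'
            # weights are not generated independently.
            h = [math.tanh(p) for p in pre]
            chunks.append(matvec(self.W_o, h))
        return chunks
```

Changing only the first layer's embedding also changes the second layer's generated weights, which is exactly the cross-layer dependency that an independent per-layer hypernetwork cannot express.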

M-MELD: A Multilingual Multi-Party Dataset for Emotion Recognition in Conversations

Sreyan Ghosh, S Ramaneswaran, Utkarsh Tyagi, Harshvardhan Srivastava, Samden Lepcha, S Sakshi, Dinesh Manocha

Category: Natural Language Processing

2022-03-31

Expression of emotions is a crucial part of daily human communication. Emotion recognition in conversations (ERC) is an emerging field of study whose primary task is to identify the emotion behind each utterance in a conversation. Although much work has been done on ERC, it has focused only on English, ignoring other languages. In this paper, we present Multilingual MELD (M-MELD), where we extend the Multimodal EmotionLines Dataset (MELD) \cite{poria2018meld} to 4 languages beyond English, namely Greek, Polish, French, and Spanish. Beyond establishing strong baselines for all 4 of these languages, we also propose a novel architecture, DiscLSTM, that uses both sequential and conversational discourse context in a conversational dialogue for ERC. Our proposed approach is computationally efficient, can transfer across languages using just a cross-lingual encoder, and achieves better performance than most uni-modal text approaches in the literature on both MELD and M-MELD. We make our data and code publicly available on GitHub.

Study of Feature Importance for Quantum Machine Learning Models

Aaron Baughman, Kavitha Yogaraj, Raja Hebbar, Sudeep Ghosh, Rukhsan Ul Haq, Yoshika Chhabra

Category: Machine Learning

2022-02-18

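Predictor importance is a key part of data preprocessing pipelines in both classical and quantum machine learning (QML). This work presents the first study of its kind, in which feature importance for QML models is explored and contrasted with that of their classical machine learning (CML) equivalents. We developed a hybrid quantum-classical architecture in which QML models are trained and feature importance values are computed with classical algorithms on a real-world dataset. The architecture was implemented on ESPN Fantasy Football data using the Qiskit statevector simulator and IBM quantum hardware such as the IBMQ Mumbai and IBMQ Montreal systems. Even though we are in the Noisy Intermediate-Scale Quantum (NISQ) era, the results on physical quantum hardware are promising. To accommodate the current scale of quantum hardware, we created data tiering, model aggregation, and a novel validation method. Notably, feature importance from the quantum models showed higher variation than that from the classical models. We show through diversity measurements that equivalent QML and CML models are complementary. The diversity between QML and CML indicates that the two approaches can contribute to a solution in different ways. In this paper, we focus on Quantum Support Vector Classifiers (QSVC), Variational Quantum Circuits (VQC), and their classical counterparts. The ESPN and IBM Fantasy Football Trade Assistant combines advanced statistical analysis with natural language processing from Watson Discovery to provide fair, personalized trade recommendations. Here, player valuation data for each player was considered, and this work can be extended to compute feature importance for other QML models such as quantum Boltzmann machines.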
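
The abstract says feature importance is computed with classical algorithms but does not name them. Permutation importance is one widely used model-agnostic option and is assumed here: shuffle one feature column and measure the accuracy drop. Because it only calls `model(x)`, the same code could wrap a quantum classifier's predict function. The pairwise disagreement rate stands in for the unspecified diversity measurements. All function names are hypothetical.

```python
import random


def accuracy(model, X, y):
    return sum(1 for xi, yi in zip(X, y) if model(xi) == yi) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only feature j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def disagreement(model_a, model_b, X):
    """Fraction of inputs on which two classifiers disagree: a simple diversity measure."""
    return sum(1 for xi in X if model_a(xi) != model_b(xi)) / len(X)
```

A feature the model ignores gets an importance of exactly 0, and a high disagreement rate between a quantum model and its classical counterpart at similar accuracy is the kind of evidence that the two are complementary.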

AequeVox: Automated Fairness Testing of Speech Recognition Systems

Sai Sathiesh Rajan, Sakshi Udeshi, Sudipta Chattopadhyay

Category: Machine Learning | Natural Language Processing

2021-10-19

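Automatic speech recognition (ASR) systems have become ubiquitous. They are found in a wide range of form factors and are increasingly important in our daily lives. It is therefore crucial to ensure that these systems are fair to different subgroups of the population. In this paper, we introduce AequeVox, an automated testing framework for evaluating the fairness of ASR systems. AequeVox simulates different environments to assess how effective ASR systems are for different populations. We also investigate whether the chosen simulations are comprehensible to humans. We further propose a fault localization technique capable of identifying words that are not robust to these varying environments. Both components of AequeVox operate without ground-truth data. We evaluated AequeVox on speech from four different datasets using three different commercial ASR systems. Our experiments show that non-native, female, and Nigerian English speakers produce, on average, 109%, 528.5%, and 156.9% more errors than native, male, and UK Midlands speakers, respectively. Our user study also shows that 82.9% of the simulations (applied through speech transformations) received a comprehensibility rating above seven (out of ten), with the lowest rating being 6.78. This further validates the fairness violations discovered by AequeVox. Finally, we show that the non-robust words predicted by AequeVox's fault localization technique exhibit more errors than the predicted robust words across all ASR systems.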
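
Since AequeVox works without ground truth, one plausible way to count errors, assumed here rather than taken from the paper, is the word-level edit distance between the transcript of the original audio and the transcript of the transformed audio. Group figures such as "109% more errors" are then relative differences of mean error counts. Both function names are hypothetical.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distance from an empty reference prefix
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (rw != hw))  # substitution, or match at no cost
        prev = cur
    return prev[-1]


def percent_more_errors(errors_a, errors_b):
    """How many percent more errors group A makes than group B, on average."""
    mean_a = sum(errors_a) / len(errors_a)
    mean_b = sum(errors_b) / len(errors_b)
    return (mean_a - mean_b) / mean_b * 100
```

Under this reading, "109% more errors" means the first group's mean error count is about 2.09 times the second group's.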

Speech Toxicity Analysis: A New Spoken Language Processing Task

Sreyan Ghosh, Samden Lepcha, S Sakshi, Rajiv Ratn Shah

Category: Natural Language Processing

2021-10-14

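Toxic speech, also known as hate speech, is regarded as one of the crucial issues plaguing online social media today. Recent work on toxic speech detection is restricted to the text modality, and no existing work detects toxicity from spoken utterances. In this paper, we propose a new spoken language processing task: detecting toxicity from spoken speech. We introduce DeToxy, the first publicly available toxicity-annotated dataset for English speech, sourced from various openly available speech databases and comprising over 2 million utterances. Finally, we also provide an analysis of how a speech corpus annotated for toxicity can help facilitate the development of end-to-end (E2E) models that better capture the various prosodic cues in speech, thereby improving toxicity classification of spoken utterances.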

Astraea: Grammar-based Fairness Testing

Ezekiel Soremekun, Sakshi Udeshi, Sudipta Chattopadhyay

Category: Artificial Intelligence | Machine Learning

2020-10-06

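Software often produces biased outputs. In particular, machine learning (ML)-based software is known to produce erroneous predictions when processing discriminatory inputs. Such unfair program behavior can be caused by societal bias. In the last few years, Amazon, Microsoft, and Google have provided software services that produce unfair outputs, mostly due to societal bias (e.g., gender or race). In such events, developers are saddled with the task of conducting fairness testing. Fairness testing is challenging: developers are tasked with generating discriminatory inputs that reveal and explain biases. We propose a grammar-based fairness testing approach (called Astraea) that leverages context-free grammars to generate discriminatory inputs that reveal fairness violations in software systems. Using probabilistic grammars, Astraea also provides fault diagnosis by isolating the cause of observed software bias. Astraea's diagnoses facilitate the improvement of ML fairness. Astraea was evaluated on 18 software systems that provide three major natural language processing (NLP) services. In our evaluation, Astraea generated fairness violations at a rate of about 18%. Astraea generated over 573K discriminatory test cases and 102K fairness violations. Furthermore, Astraea improves software fairness by about 76% via model retraining.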
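
The core loop of grammar-based fairness testing can be sketched as: sample an input template from a probabilistic context-free grammar, fill the sensitive slot with each value of a protected attribute, and flag the template if the system's output changes. The toy grammar, the `<PERSON>` placeholder, and the functions `expand` and `fairness_test` are hypothetical; Astraea's actual grammars and its fault-diagnosis step are not reproduced here.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to (weight, expansion) pairs.
GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.5, ["the", "PERSON"]), (0.5, ["my", "PERSON"])],
    "VP": [(0.6, ["is", "ADJ"]), (0.4, ["was", "ADJ"])],
    "ADJ": [(0.5, ["brilliant"]), (0.5, ["late"])],
    "PERSON": [(1.0, ["<PERSON>"])],  # placeholder filled with each sensitive value
}


def expand(symbol, rng):
    """Expand a symbol by sampling productions according to their weights."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal
    weights = [w for w, _ in GRAMMAR[symbol]]
    _, rhs = rng.choices(GRAMMAR[symbol], weights=weights, k=1)[0]
    out = []
    for s in rhs:
        out.extend(expand(s, rng))
    return out


def fairness_test(model, sensitive_values, n_tests=100, seed=0):
    """Return templates whose output changes when only the sensitive token changes."""
    rng = random.Random(seed)
    violations = []
    for _ in range(n_tests):
        template = " ".join(expand("S", rng))
        outputs = {model(template.replace("<PERSON>", v)) for v in sensitive_values}
        if len(outputs) > 1:
            violations.append(template)
    return violations
```

The violation rate is then `len(violations) / n_tests`, the kind of figure reported above as about 18%.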